
@bramburn
Contributor

@bramburn bramburn commented May 9, 2025

I've implemented a patch-based checkpoint system for Roo-Code that drastically reduces storage requirements while maintaining full functionality. Here's what I've done:

  1. Created core classes for the patch-based checkpoint system:
    • PatchCheckpointService: Main service that handles checkpoint operations
    • PatchDatabase: SQLite database for storing checkpoint metadata
    • PatchGenerator: Utility for generating and applying patches
    • PatchCheckpointServiceFactory: Factory for creating service instances
  2. Updated the existing checkpoint system to use the new patch-based implementation:
    • Modified src/core/checkpoints/index.ts to use PatchCheckpointServiceFactory instead of RepoPerTaskCheckpointService
    • Updated Task.ts to use the new PatchCheckpointService type
  3. Created a migration system to convert existing Git-based checkpoints to the new patch-based format:
    • MigrationService: Handles migrating existing checkpoints
    • Added a command to trigger the migration: roo-cline.migrateCheckpoints
  4. Added tests to verify the implementation works correctly
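The PR description does not include PatchGenerator's code. As a rough illustration of what generating and applying patches can mean, here is a minimal sketch that assumes a file-level patch: only files whose content changed are recorded, each with its full before/after content (null meaning the file is absent). Snapshot, FileChange, diffSnapshots and applyPatch are hypothetical names, not the PR's API, and a real implementation would more likely store line-level diffs.

```typescript
// Hypothetical sketch; not the PR's actual PatchGenerator.
// A snapshot maps workspace-relative paths to file contents.
type Snapshot = Record<string, string>;

// One entry per changed file; null means "file absent" on that side.
interface FileChange {
  path: string;
  before: string | null;
  after: string | null;
}

// Own-property check, so paths like "constructor" are not confused with prototype keys.
const has = (obj: Snapshot, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Generate a patch: keep only the files whose content differs between snapshots.
function diffSnapshots(prev: Snapshot, next: Snapshot): FileChange[] {
  const paths = [...Object.keys(prev), ...Object.keys(next)]
    .sort()
    .filter((p, i, arr) => i === 0 || arr[i - 1] !== p); // de-duplicate sorted paths
  const changes: FileChange[] = [];
  for (const path of paths) {
    const before = has(prev, path) ? prev[path] : null;
    const after = has(next, path) ? next[path] : null;
    if (before !== after) changes.push({ path, before, after });
  }
  return changes;
}

// Apply a patch forward (prev -> next) or, with reverse = true, backward (next -> prev).
function applyPatch(state: Snapshot, changes: FileChange[], reverse = false): Snapshot {
  const result: Snapshot = { ...state };
  for (const change of changes) {
    const content = reverse ? change.before : change.after;
    if (content === null) delete result[change.path];
    else result[change.path] = content;
  }
  return result;
}
```

Because only changed files are recorded, the size of each patch tracks what a checkpoint actually modified rather than the size of the workspace.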

Benefits of the New Implementation

  1. Drastically Reduced Storage Requirements: Instead of storing full copies of the workspace for each checkpoint, we now store only the differences between states.
  2. Improved Performance: The patch-based system is more efficient, especially for large workspaces.
  3. Better Scalability: The system can handle more checkpoints without excessive disk usage.
  4. Maintained Compatibility: The new system maintains the same API as the old one, so existing code that uses checkpoints will continue to work.

Related GitHub Issue

Closes: #3080

Description

Test Procedure

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue.
  • New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Documentation Updates

Additional Notes

bramburn added 2 commits May 9, 2025 22:38
@changeset-bot

changeset-bot bot commented May 9, 2025

⚠️ No Changeset found

Latest commit: 2e096a2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@KJ7LNW
Contributor

KJ7LNW commented May 9, 2025

> PatchDatabase: SQLite database for storing checkpoint metadata

You could use Git as the storage backend. Since you are already handling only the files that are being modified, the .git directory would still be much smaller, and Git may enable useful future features without reinventing the wheel. I think the only part of your implementation you would have to modify is PatchDatabase, so that it stores the changes in Git.
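One way to read this suggestion: keep a separate Git directory and stage only the files the task touched, using Git's global --git-dir and --work-tree options. The sketch below only builds argument lists and does not run Git; ShadowRepo, gitArgs, checkpointCommitArgs and the .roo/.git location are assumptions for illustration, not part of the PR.

```typescript
// Hypothetical sketch: build git argv lists for committing a checkpoint into a
// separate "shadow" repository that only ever stages the files Roo touched.
// Nothing is executed here; each array is meant for something like execFile("git", args).
interface ShadowRepo {
  gitDir: string;   // e.g. ".roo/.git" (assumed location)
  workTree: string; // the user's workspace root
}

// Prefix every command with the standard global options that select the repo.
function gitArgs(repo: ShadowRepo, ...args: string[]): string[] {
  return [`--git-dir=${repo.gitDir}`, `--work-tree=${repo.workTree}`, ...args];
}

function checkpointCommitArgs(repo: ShadowRepo, touchedFiles: string[], message: string): string[][] {
  if (touchedFiles.length === 0) return []; // nothing to checkpoint
  return [
    // Stage only the touched paths; --all also records deletions of already-committed
    // paths, and "--" stops option parsing so unusual filenames are treated as paths.
    gitArgs(repo, "add", "--all", "--", ...touchedFiles),
    gitArgs(repo, "commit", "--quiet", "-m", message),
  ];
}
```

Passing arrays rather than a single shell string to Node's child_process.execFile("git", args) avoids shell quoting problems with odd file names.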

@bramburn
Contributor Author

bramburn commented May 9, 2025 via email

@bramburn
Contributor Author

bramburn commented May 9, 2025 via email

@KJ7LNW
Contributor

KJ7LNW commented May 9, 2025

> I was only using the sqlite to have a linked list, so that if there are 5 checkpoints in a task and we revert all the way to checkpoint 2, it will revert 5, 4, 3 and 2. That was the edge case I was thinking about.

Also make sure that we handle the case where a file on disk changes outside of the task: if this happens, you still need to be able to restore the checkpoint independently of the repository's disk state.
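The linked-list revert described here can be sketched as follows, assuming each checkpoint records, for every file it touched, that file's content before the checkpoint's changes (null meaning the file did not exist). Reverting "all the way to checkpoint 2" then undoes 5, 4, 3 and 2, newest first. Checkpoint and revertThrough are hypothetical names, not the PR's code.

```typescript
// Hypothetical sketch of sequential (linked-list) revert.
type Files = Record<string, string>;

interface Checkpoint {
  id: number;
  // Content of each touched file before this checkpoint's changes (null = absent).
  before: Record<string, string | null>;
}

// Undo every checkpoint whose id >= targetId, newest first.
// Assumes `chain` is sorted by ascending id.
function revertThrough(current: Files, chain: Checkpoint[], targetId: number): Files {
  const result: Files = { ...current };
  for (let i = chain.length - 1; i >= 0 && chain[i].id >= targetId; i--) {
    const before = chain[i].before;
    for (const path of Object.keys(before)) {
      const content = before[path];
      if (content === null) delete result[path];
      else result[path] = content;
    }
  }
  return result;
}
```

Because this sketch stores whole-file "before" contents, the result for touched files does not depend on what is currently on disk. If each checkpoint held line-level diffs instead, every reverse patch would have to apply cleanly to the current file, which breaks once the user has edited those lines by hand; that is the concern raised in the reply.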

@KJ7LNW
Contributor

KJ7LNW commented May 10, 2025

You might also rename the subject to match this issue.

@bramburn
Contributor Author

> You might also rename the subject to match this issue.

done

@bramburn
Contributor Author

> I was only using the sqlite to have a linked list, so that if there are 5 checkpoints in a task and we revert all the way to checkpoint 2, it will revert 5, 4, 3 and 2. That was the edge case I was thinking about.

> Also make sure that we handle the case where a file on disk changes outside of the task: if this happens, you still need to be able to restore the checkpoint independently of the repository's disk state.

Are you happy with SQLite, or should I switch back to the .git files? The only reason was the sequential reverting of the checkpoints. I would have thought checkpoints are only needed for changes made by Roo Code; if another AI agent extension makes changes, it would track its own changes.

@bramburn bramburn changed the title from Bramburn/bug/3080 to Patch-Based Checkpointing for Roo-Code bug 3080 May 10, 2025
Contributor Author

added sqlite

Contributor Author

added sqlite

Contributor Author

added a migration from the old checkpoint format to the new one

Contributor Author

this is the UI for the actual command; it still needs testing

Contributor Author

here we replace RepoPerTaskCheckpointService with the patch-driven one

@hannesrudolph hannesrudolph moved this from New to PR [Draft/WIP] in Roo Code Roadmap May 10, 2025
@KJ7LNW
Contributor

KJ7LNW commented May 10, 2025

> I was only using the sqlite to have a linked list, so that if there are 5 checkpoints in a task and we revert all the way to checkpoint 2, it will revert 5, 4, 3 and 2. That was the edge case I was thinking about.

> Also make sure that we handle the case where a file on disk changes outside of the task: if this happens, you still need to be able to restore the checkpoint independently of the repository's disk state.

> Are you happy with SQLite, or should I switch back to the .git files? The only reason was the sequential reverting of the checkpoints. I would have thought checkpoints are only needed for changes made by Roo Code; if another AI agent extension makes changes, it would track its own changes.

The reason we need to be able to reconstruct the content at any point is that users may modify the code, so a series of incremental differences is not sufficient to reconstruct a checkpoint. I change things by hand all the time, but I still use the checkpoints when necessary, so they need to stay consistent regardless of user edits or other external changes.

SQLite is fine for metadata, but I think it would be a good idea to keep file content in a dedicated checkpoint Git repository. However, it should still store only the files that were modified by Roo, because the entire problem in issue #3080 was that it stored files Roo did not modify.
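A minimal sketch of the "reconstruct content at any point" requirement, assuming each checkpoint records the full content of every file Roo has touched so far, deduplicated by content hash the way a Git object store is. FNV-1a stands in here for Git's real SHA-1/SHA-256 object ids. Restoring writes the stored content back regardless of what is on disk, and files Roo never touched are left alone. CheckpointStore, save and restore are hypothetical names.

```typescript
// Hypothetical sketch: content-addressed checkpoints of Roo-touched files only.
type Files = Record<string, string>;

// FNV-1a (32-bit): a small stand-in for a real content hash. 32 bits means
// collisions are possible; a real store would use a cryptographic hash.
function contentHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

class CheckpointStore {
  private blobs = new Map<string, string>(); // hash -> content (deduplicated)
  private trees: Array<Record<string, string | null>> = []; // path -> hash, null = deleted

  // Record a checkpoint: carry over the previous tree and update only touched files.
  save(workspace: Files, touched: string[]): number {
    const prev: Record<string, string | null> =
      this.trees.length > 0 ? this.trees[this.trees.length - 1] : {};
    const tree: Record<string, string | null> = { ...prev };
    for (const path of touched) {
      if (!Object.prototype.hasOwnProperty.call(workspace, path)) {
        tree[path] = null; // Roo deleted this file
        continue;
      }
      const content = workspace[path];
      const hash = contentHash(content);
      this.blobs.set(hash, content);
      tree[path] = hash;
    }
    this.trees.push(tree);
    return this.trees.length - 1;
  }

  // Restore a checkpoint on top of whatever is on disk now; old content never comes from disk.
  restore(disk: Files, id: number): Files {
    const tree = this.trees[id];
    if (!tree) throw new Error(`unknown checkpoint ${id}`);
    const result: Files = { ...disk };
    for (const path of Object.keys(tree)) {
      const hash = tree[path];
      if (hash === null) delete result[path];
      else result[path] = this.blobs.get(hash)!;
    }
    return result;
  }
}
```

A hand edit to a tracked file therefore does not prevent returning to an earlier checkpoint. A file Roo first created after that checkpoint is not in its tree and is left as is, which a fuller design would have to decide how to handle.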

An additional benefit of keeping Git as the backend is that it enables other features that may be useful for handling source history. Here are some (surprisingly good) examples that o4-mini came up with for future features that could use the checkpoint Git tree:

Below is a set of refinements oriented around “one out‐of‐band Git repo per workspace, but a Git branch per Roo Task,” so you get a non-flat, multi-task history, parallel workstreams, abandoned experiments, feature–task isolation, etc.

  1. Task ↔ Branch Mapping
    • When you “Start Task,” create a new branch in .vscode/roo-checkpoints named something like
    roo/task/<TASK_ID> (or feature/…, experiment/…, inspect/… depending on type).
    • The branch starts from whichever commit (snapshot) you designate as your base.
    • Checkpoints within a task become ordinary commits on that branch.

  2. Task Metadata
    • In the commit message or in an indexed JSON file (in the Git repo), record:
    – task ID, type (feature, experiment, inspection), parent task (if branched), start‐date, status (in-progress, done, abandoned).
    • Expose a small SQLite or JSON file that tracks branch ↔ task metadata for fast lookups.

  3. Task Lifecycle Commands
    – “Roo: Start Task…” → quick-pick type & base branch → creates new branch + checkout it.
    – “Roo: Commit Checkpoint” → stages only Roo-touched files → commit to current task branch.
    – “Roo: Switch Task…” → quick-pick from active/inactive tasks → checkout that branch.
    – “Roo: Finish Task” → marks status = done. Optionally tags the branch with roo/done/<TASK_ID>.
    – “Roo: Abandon Task” → marks status = abandoned. Hides it from default views.
    – “Roo: Delete Task” → deletes the Git branch entirely. (Only after it’s abandoned or merged.)

  4. Parallel & Forked Work
    • Because each task is its own branch, you can run multiple Roo tasks in parallel.
    • You can fork a new task off an existing task branch (e.g. experiment off a feature).
    Roo: Fork Task… → select parent task → gives you a new branch retaining that parent’s history up to HEAD.

  5. Inspecting & Browsing
    • Tree View in Activity Bar
    – Top-level: Tasks (filtered by status: in-progress, done, abandoned).
    – Under each task: list of checkpoints (show date + short message).
    – Context menu: Diff, Checkout, Merge to Main, Abandon, Delete.
    • Timeline Provider per File
    – When you open a file, VS Code Timeline shows only the commits (checkpoints) from branches that touched it.
    • CodeLens & Inline Gutter Icons
    – “Revert hunk to this checkpoint” and “Which task modified this line?” on hover.

  6. Merging & Cherry-Picking
    • “Roo: Merge Task to Main Repo”
    – Checks out your real workspace branch, then cherry-picks or merges the entire Roo task branch.
    – Runs VS Code’s merge-conflict UI if needed.
    • Cherry-Pick Single Checkpoint
    – From any task branch, pick individual commits into the main repo or into another task branch.

  7. Abandonment & Cleanup
    • Settings
    roo.cleanup.abandonedAfterDays (e.g. 7)
    roo.cleanup.keepCompletedTasks (true/false)
    • Commands
    – “Roo: Prune Abandoned” → lists abandoned tasks older than threshold, then deletes their branches.
    – “Roo: Prune Completed” → optionally delete or archive branches marked done.

  8. Search & Reporting
    • Global “Roo: Search Tasks for ‘foo’” → greps across all task branches, showing matches in commit messages or diffs.
    • “Roo: List Tasks” → quick table of Task ID, Type, Status, Checkpoints Count, Last Updated.

  9. Task Tagging & Annotations
    • Allow arbitrary labels per task: bugfix, performance, docs.
    • Store extra JSON metadata on branches: CI status, model confidence, reviewer notes.

  10. UI Polishing
    • SCM-style view (scm.createSourceControl) that treats your current task branch as “the working tree,” so VS Code’s built-in diff, stage, undo flows “just work.”
    • Custom Webview: a mini commit graph showing task-branch lifetimes, forks, merges, abandonments.
    • Status bar item: “Roo Task: <TASK_ID> (<in-progress|done|abandoned>)” with a click menu for switching.
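The task metadata, lifecycle commands and cleanup settings in items 2, 3 and 7 can be sketched as plain data plus a few guard functions. This assumes the metadata lives in a JSON or SQLite index as item 2 suggests; TaskMeta, canDelete, transition and selectPrunable are hypothetical names, the merged flag and the "only in-progress tasks can change status" rule are assumptions, and the settings fields mirror roo.cleanup.abandonedAfterDays and roo.cleanup.keepCompletedTasks. The Git operations themselves are left out.

```typescript
// Hypothetical sketch of task metadata and lifecycle guards (no Git calls).
type TaskType = "feature" | "experiment" | "inspection";
type TaskStatus = "in-progress" | "done" | "abandoned";

interface TaskMeta {
  id: string;
  type: TaskType;
  status: TaskStatus;
  branch: string;       // e.g. "roo/task/<TASK_ID>"
  parentTask?: string;  // set when forked from another task
  startedAt: Date;
  updatedAt: Date;
  merged: boolean;      // assumed flag, set after merging the task into the main repo
}

interface CleanupSettings {
  abandonedAfterDays: number;  // mirrors roo.cleanup.abandonedAfterDays
  keepCompletedTasks: boolean; // mirrors roo.cleanup.keepCompletedTasks
}

const DAY_MS = 24 * 60 * 60 * 1000;

// "Delete Task" is only allowed once a task is abandoned or merged.
function canDelete(task: TaskMeta): boolean {
  return task.status === "abandoned" || task.merged;
}

// Assumed rule: only in-progress tasks can be finished or abandoned.
function transition(task: TaskMeta, next: "done" | "abandoned", now: Date): TaskMeta {
  if (task.status !== "in-progress") {
    throw new Error(`task ${task.id} is already ${task.status}`);
  }
  return { ...task, status: next, updatedAt: now };
}

// Tasks whose branches a prune command would offer to delete.
function selectPrunable(tasks: TaskMeta[], settings: CleanupSettings, now: Date): TaskMeta[] {
  return tasks.filter((t) => {
    if (t.status === "abandoned") {
      return now.getTime() - t.updatedAt.getTime() > settings.abandonedAfterDays * DAY_MS;
    }
    return t.status === "done" && !settings.keepCompletedTasks;
  });
}
```

Keeping these rules as pure functions over the metadata index means the commands can show what they would prune before touching any branch.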

By modeling each Roo task as its own Git branch inside a lightweight .vscode/roo-checkpoints repo—and only ever staging the files your AI actually touched—you get:
• Non-flat, parallel task histories.
• Clear isolation between experiments, features, abandoned attempts.
• Full power of Git for diffs, reverts, cherry-picks, merges.
• VS Code–native integration via SCM provider, Tree View, Timeline, CodeLens, and Webviews.

@KJ7LNW
Contributor

KJ7LNW commented May 10, 2025

I really like the idea of one out-of-band Git tree per workspace (e.g. .roo/.git), because then I could do something like git remote add roo .roo/.git and work with task branches accordingly. Branch names like roo/task/<TASK_ID> could be renamed by users to be meaningful, or the AI could come up with a reasonable branch name for each task, something like roo/task/<TASK_ID>-human-readable-name.
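If the AI or the user proposes a human-readable suffix, it still has to become a valid Git branch name. A minimal sketch, assuming the roo/task/<TASK_ID>-<slug> pattern from the comment above and a task id that is already branch-safe; slugify, taskBranchName and the 40-character cap are illustrative choices, and the slug rules are a conservative subset of what git check-ref-format allows rather than a full implementation of it.

```typescript
// Hypothetical helper: turn a task id plus a free-form title into a branch name
// like "roo/task/<TASK_ID>-<slug>". Keeping only [a-z0-9-] avoids the characters
// git check-ref-format rejects (spaces, ~ ^ : ? * [ and backslash) as well as
// ".." sequences and ".lock" endings.
function slugify(title: string, maxLength = 40): string {
  return title
    .toLowerCase()
    .normalize("NFKD")               // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^a-z0-9]+/g, "-")     // any other run of characters -> one hyphen
    .replace(/^-+|-+$/g, "")         // no leading/trailing hyphens
    .slice(0, maxLength)
    .replace(/-+$/g, "");            // slicing may leave a trailing hyphen
}

// Assumes taskId itself contains only branch-safe characters.
function taskBranchName(taskId: string, title?: string): string {
  const slug = title ? slugify(title) : "";
  return slug ? `roo/task/${taskId}-${slug}` : `roo/task/${taskId}`;
}
```

For example, taskBranchName("123", "Fix checkpoint disk usage!") yields roo/task/123-fix-checkpoint-disk-usage, and a title with no usable characters falls back to plain roo/task/123.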

AI profiles could be given instructions to work with that repository, inspecting it and integrating its changes into the existing main repository: mostly via cherry-picks, because it is only a partial file tree, but maybe other operations as well. Here are some other really cool ideas (from o4-mini).

I especially like the ability to ask the AI things like "search for the task that does X and integrate change Y into this current task". I cannot tell you how many times I have wanted to find something in my ancient task history, but it is seriously difficult because there is so much of it and the search tool really is not sufficient.

By giving Roo a small set of Git primitives over .roo/.git plus these inspection commands, you get full “time-travel” and merge strategies for just the files Roo touched—and you can ask the AI to audit, summarize, or triage any task before it ever lands in your main codebase.
Here’s a much shorter sketch of how you’d teach Roo (and your VS Code extension) to work with a private “.roo/.git” alongside your main repo—and especially how to get it to inspect tasks before you ever merge anything:

  1. Core “Roo Git” Verbs
    • list-tasks
    – List all roo/task/* branches (with status: in-progress, done, abandoned).
    • diff-task
    – git diff main...roo/task/123 (or any two checkpoints on that branch).
    • cherry-pick-task
    – Apply one or all commits from roo/task/123 into your current work branch.
    • rebase-task
    – Rebase the task branch onto the latest main.
    • sandbox-task
    – Create a throw-away worktree: git worktree add .roo/sandbox/123 roo/task/123 for isolated testing.
    • prune-tasks
    – Delete or archive stale/merged/abandoned task branches automatically.

  2. AI Inspection Hooks
    Teach Roo to query and summarize any task branch before you merge or prune it:

    a. change-summary
    – “What’s the high-level purpose of task 123?”
    – AI parses the full diff and emits a bullet-list: feature added, bug fixed, files/modules touched.

    b. impact-analysis
    – “Which public APIs, config files, or documentation does task 123 affect?”
    – AI reads filenames/AST to highlight potential ripple effects.

    c. conflict-risk
    – “Where might merging task 123 into main conflict?”
    – AI scans overlapping hunks or related modules and flags hotspots.

    d. quality-audit
    – “Run lint/tests on sandbox 123 and summarize failures or coverage gaps.”
    – AI invokes your CI locally, collates errors, suggests fixes.

    e. security/perf review
    – “Any new TODOs, insecure patterns, or performance regressions in task 123?”
    – AI greps for e.g. raw SQL, disabled asserts, expensive loops, etc., then reports.

  3. Common Inspection Scenarios
    • Before merging a big refactor—get a natural-language summary + risk map so you know where to spot-check.
    • On an abandoned branch—ask “Why did I abandon task 234?” and AI reviews commits/comments to surface the blocker.
    • When parallel tasks touch similar modules—have AI compare two branches and advise which to cherry-pick first.
    • After an AI-generated feature—“Generate a PR description and list missing tests” so you can complete the review.
    • During cleanup—“Which abandoned tasks haven’t run tests in 30 days?” or “Which feature tasks never got finished?”
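The conflict-risk check could start from something much simpler than an AI pass: comparing the old-side line ranges of hunks in two Git-style unified diffs. A minimal sketch, assuming both diffs were taken against the same base commit and ideally produced with git diff -U0 so hunks carry no context lines; Hunk, parseHunks and overlappingHunks are hypothetical names. It flags only textual overlap (plus adjacency, since line-based merges often conflict there too), so semantic conflicts between distant edits go unnoticed.

```typescript
// Hypothetical sketch: flag hunks from two diffs that touch the same or adjacent lines.
interface Hunk {
  file: string;  // old-side path
  start: number; // first old-side line touched (inclusive)
  end: number;   // last old-side line touched (inclusive)
}

// "diff --git" and "@@" lines can never be hunk content, which always starts with ' ', '+', '-' or '\'.
const FILE_HEADER = /^diff --git a\/(.+) b\/(.+)$/;
const HUNK_HEADER = /^@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@/;

function parseHunks(diff: string): Hunk[] {
  const hunks: Hunk[] = [];
  let file = "";
  for (const line of diff.split("\n")) {
    const f = FILE_HEADER.exec(line);
    if (f) {
      file = f[1];
      continue;
    }
    const h = HUNK_HEADER.exec(line);
    if (h && file) {
      const start = Number(h[1]);
      const count = h[2] === undefined ? 1 : Number(h[2]); // omitted count means 1
      // A pure insertion (count 0) is conservatively treated as touching line `start`.
      hunks.push({ file, start, end: start + Math.max(count, 1) - 1 });
    }
  }
  return hunks;
}

// Pairs of hunks in the same file whose old-side ranges overlap or are adjacent.
function overlappingHunks(a: Hunk[], b: Hunk[]): Array<[Hunk, Hunk]> {
  const risky: Array<[Hunk, Hunk]> = [];
  for (const x of a) {
    for (const y of b) {
      if (x.file === y.file && x.start <= y.end + 1 && y.start <= x.end + 1) {
        risky.push([x, y]);
      }
    }
  }
  return risky;
}
```

A cheap pass like this could narrow down which files an AI review should read closely before suggesting a cherry-pick order.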

@hannesrudolph hannesrudolph moved this from New to PR [Draft/WIP] in Roo Code Roadmap May 20, 2025
@hannesrudolph hannesrudolph moved this from PR [Draft / In Progress] to TEMP in Roo Code Roadmap May 26, 2025
@daniel-lxs
Member

daniel-lxs commented May 26, 2025

Hey @bramburn,
Thank you for your contribution. We noticed that the issue addressed by this PR has another pending PR, #3695. Since this PR seems stale, it will be closed. If you plan to revisit this, feel free to reopen it and let us know on issue #3080.

@daniel-lxs daniel-lxs closed this May 26, 2025
@github-project-automation github-project-automation bot moved this from TEMP to Done in Roo Code Roadmap May 26, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft/WIP] to Done in Roo Code Roadmap May 26, 2025
SmartManoj pushed a commit to SmartManoj/Raa-Code that referenced this pull request Jun 13, 2025
* feat: support Stremeable Http transport

* feat: add http to rpc method

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Checkpoints creating excessive disk usage (40GB+) in VSCode global storage

3 participants